3 Visualizing uncertainty

One of the most challenging aspects of data visualization is the visualization of uncertainty. When we see a data point drawn in a specific location, we tend to interpret it as a precise representation of the true data value. It is difficult to conceive that a data point could actually lie somewhere it hasn’t been drawn. Yet this scenario is ubiquitous in data visualization. Nearly every data set we work with has some uncertainty, and whether and how we choose to represent this uncertainty can make a major difference in how accurately our audience perceives the meaning of the data.

The most common way to indicate uncertainty is with error bars. The basic idea is to complement a measure of central tendency (mean, median…) with an indicator of dispersion (sd, IQR…). To do so, we represent the central measure with a bar or a point and draw the error bars by adding and subtracting the dispersion measure. Reporting uncertainty is key to understanding data properly. Compare what happens when we plot only the mean:

fires %>%
    # keep only fires with BAREA above 500
    filter(BAREA>500) %>%
    group_by(MONTH) %>%
    # one row per month: mean and standard deviation of BAREA
    summarise(Mean=mean(BAREA), SD=sd(BAREA)) %>%
    ggplot(aes(x=MONTH,y=Mean)) +
        geom_col()

And the result when we account for uncertainty:

fires %>%
    filter(BAREA>500) %>%
    group_by(MONTH) %>%
    summarise(Mean=mean(BAREA), SD=sd(BAREA)) %>%
    ggplot(aes(x=MONTH,y=Mean)) +
        geom_col() +
        # error bars span one standard deviation below and above the mean
        geom_errorbar(aes(ymin=Mean-SD, ymax=Mean+SD))

The same can be done using dot plots:

fires %>%
    filter(BAREA>500) %>%
    group_by(MONTH) %>%
    summarise(Mean=mean(BAREA), SD=sd(BAREA)) %>%
    ggplot(aes(x=MONTH,y=Mean)) +
        geom_point() +
        geom_errorbar(aes(ymin=Mean-SD, ymax=Mean+SD))

Or we can even encode the dispersion with color, mapping the standard deviation to the fill of the bars:

fires %>%
    filter(BAREA>500) %>%
    group_by(MONTH) %>%
    summarise(Mean=mean(BAREA), SD=sd(BAREA)) %>%
    # bar height shows the mean, bar fill shows the standard deviation
    ggplot(aes(x=MONTH, y=Mean,fill=SD)) +
        geom_col()
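
The examples above all pair the mean with the standard deviation, but the same pattern applies to the other measures mentioned earlier, such as the median and the IQR. The following is a minimal sketch, not part of the original analysis: it uses a small made-up data frame (toy_fires, with hypothetical BAREA values) so that it runs on its own, and it draws the error bars from the first to the third quartile, which is one common way to display the IQR around a median.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the fires data: made-up values, for illustration only
toy_fires <- data.frame(
    MONTH = factor(rep(c(6, 7, 8), each = 5)),
    BAREA = c(520, 610, 750, 900, 2400,
              530, 700, 820, 1500, 5200,
              510, 560, 640, 980, 1300)
)

# Median as the central measure, quartiles as the limits of the error bars
toy_summary <- toy_fires %>%
    group_by(MONTH) %>%
    summarise(Median = median(BAREA),
              Q1 = quantile(BAREA, 0.25, names = FALSE),
              Q3 = quantile(BAREA, 0.75, names = FALSE))

# Point for the median, error bar from the first to the third quartile
p <- ggplot(toy_summary, aes(x = MONTH, y = Median)) +
    geom_point() +
    geom_errorbar(aes(ymin = Q1, ymax = Q3), width = 0.2)
print(p)
```

Unlike the mean ± SD bars, these error bars are usually asymmetric around the point, which makes skewed distributions easier to spot. Replacing toy_fires with the filtered fires data should give the corresponding plot for the real data.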

EXERCISE 2

Represent the relationship between tree height and diameter using the trees dataset. Explore potential differences among provinces or the most representative species.